
    Robust, fuzzy, and parsimonious clustering based on mixtures of Factor Analyzers

    A clustering algorithm that combines the advantages of fuzzy clustering and robust statistical estimators is presented. It is based on mixtures of Factor Analyzers and relies on the joint use of trimming and constrained estimation of the scatter matrices within a modified maximum likelihood approach. The algorithm generates a set of membership values that are used to fuzzily partition the data set and to contribute to the robust estimates of the mixture parameters. Modeling clusters by Gaussian Factor Analysis allows for dimension reduction and for discovering local linear structures in the data. The new methodology is shown to be resistant to different types of contamination by applying it to artificial data. A brief discussion of the tuning parameters (the trimming level, the fuzzifier parameter, the number of clusters, and the value of the scatter-matrix constraint) is provided, together with some heuristic tools for choosing them. Finally, a real data set is analyzed to show how intermediate membership values are estimated for observations lying where clusters overlap, while cluster cores are composed of observations assigned to a cluster in a crisp way. Funding: Ministerio de Economía y Competitividad grant MTM2017-86061-C2-1-P, and Consejería de Educación de la Junta de Castilla y León and FEDER grants VA005P17 and VA002G1.
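
    The interplay of fuzzy memberships and trimming can be pictured with a minimal sketch, which is not the authors' estimator: it assumes plain Gaussian components (rather than factor-analytic ones) and a simplified membership rule, and uses the mvtnorm package for the densities.

        # Illustrative sketch only: fuzzy memberships with trimming for fitted
        # Gaussian components (means `mu`, covariances `Sigma`, weights `pi_g`).
        # `m` is the fuzzifier and `alpha` the trimming level.
        fuzzy_trim_memberships <- function(x, mu, Sigma, pi_g, m = 1.5, alpha = 0.1) {
          G <- length(pi_g)
          dens <- sapply(seq_len(G), function(g)
            pi_g[g] * mvtnorm::dmvnorm(x, mean = mu[[g]], sigma = Sigma[[g]]))
          dens <- pmax(dens, .Machine$double.xmin)      # avoid 0/0 below
          # Trim the alpha proportion of points with the lowest best-fitting density
          best <- apply(dens, 1, max)
          keep <- best >= quantile(best, probs = alpha)
          # Simplified fuzzy memberships: normalized densities raised to 1/(m - 1);
          # trimmed observations get zero membership in every cluster
          u <- dens^(1 / (m - 1))
          u <- u / rowSums(u)
          u[!keep, ] <- 0
          u
        }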

    Robust constrained fuzzy clustering

    It is well known that outliers and noisy data can be very harmful when applying clustering methods. Several fuzzy clustering methods able to handle the presence of noise have been proposed. In this work, we propose a robust clustering approach called F-TCLUST based on an “impartial” (i.e., self-determined by the data) trimming. The proposed approach considers an eigenvalue ratio constraint that makes it a mathematically well-defined problem and serves to control the allowed differences among cluster scatters. A computationally feasible algorithm is proposed for its practical implementation. Some guidelines on how to choose the parameters controlling the performance of the fuzzy clustering procedure are also given.
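
    The eigenvalue ratio constraint can be pictured with a small sketch; the clipping rule below is only a naive way to satisfy the constraint, not the optimal truncation used inside the actual algorithm.

        # Naive illustration of an eigenvalue-ratio constraint: clip the eigenvalues
        # of every cluster scatter matrix from below so that the overall ratio
        # max(eigenvalue) / min(eigenvalue) across clusters is at most c_restr.
        constrain_scatters <- function(Sigma_list, c_restr = 12) {
          eig <- lapply(Sigma_list, eigen, symmetric = TRUE)
          vals <- unlist(lapply(eig, `[[`, "values"))
          lower <- max(vals) / c_restr
          lapply(eig, function(e) {
            d <- pmax(e$values, lower)
            e$vectors %*% diag(d, nrow = length(d)) %*% t(e$vectors)
          })
        }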

    A fast algorithm for robust constrained clustering

    The application of “concentration” steps is the main principle behind Forgy’s k-means algorithm and Rousseeuw and van Driessen’s fast-MCD algorithm. Despite this coincidence, it is not completely straightforward to combine both algorithms into a clustering method that is not severely affected by a few outlying observations and is able to cope with non-spherical clusters. A sensible way of combining them relies on controlling the relative cluster scatters through constrained concentration steps. With this idea in mind, a new algorithm for the TCLUST robust clustering procedure is proposed which implements such constrained concentration steps in a computationally efficient fashion.
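
    To fix ideas, here is a toy concentration step in the spirit of trimmed k-means (spherical clusters, no scatter constraint), showing the structure such steps take; the actual TCLUST steps additionally update weights and constrained scatter matrices.

        # Toy concentration step: assign points to the nearest center, discard the
        # alpha fraction with the largest distances, then refit the centers on the
        # retained observations only.
        concentration_step <- function(x, centers, alpha = 0.1) {
          d2 <- apply(centers, 1, function(m) rowSums(sweep(x, 2, m)^2))
          cl <- max.col(-d2)                                   # nearest center index
          best <- d2[cbind(seq_len(nrow(x)), cl)]
          keep <- best <= quantile(best, probs = 1 - alpha)    # keep the closest (1 - alpha) share
          new_centers <- t(sapply(seq_len(nrow(centers)), function(g)
            colMeans(x[keep & cl == g, , drop = FALSE])))
          list(centers = new_centers, assignment = ifelse(keep, cl, 0))  # 0 marks trimmed points
        }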

    Constrained parsimonious model-based clustering

    A new methodology for constrained parsimonious model-based clustering is introduced, where a tuning parameter allows the user to control the strength of the constraints. The methodology includes, as limiting cases, the 14 parsimonious models that are often applied in model-based clustering with normal components. This is done in a natural way, filling the gap among models and providing a smooth transition among them. The methodology yields mathematically well-defined problems and also helps to prevent spurious solutions. Novel information criteria are proposed to help the user choose the parameters. The interest of the proposed methodology is illustrated through simulation studies and a real-data application to COVID data. Funding: Ministerio de Economía y Competitividad (grant MTM2017-86061-C2-1-P); Junta de Castilla y León - FEDER (grants VA005P17 and VA002G18); CRoNoS COST and the University of Parma project “Statistics for fraud detection, with applications to commercial data and financial statements” (grant IC1408). Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, action 20007-CL - Apoyo Consorcio BUCL.
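
    As background (not taken from the paper itself), the 14 parsimonious Gaussian models referred to above arise from the classical volume/shape/orientation decomposition of the cluster scatter matrices; the sketch below only assembles a scatter matrix from these three elements, whereas the paper's constraints govern how much they may vary across clusters.

        # Background sketch: Sigma_g = lambda_g * D_g A_g t(D_g), with lambda_g the
        # volume, A_g a diagonal shape matrix with unit determinant and D_g an
        # orthogonal orientation matrix; the 14 parsimonious models differ in which
        # of these elements are common across clusters.
        make_sigma <- function(lambda, shape, orientation) {
          stopifnot(abs(prod(shape) - 1) < 1e-8)   # shape must have unit determinant
          lambda * orientation %*% diag(shape) %*% t(orientation)
        }

        # Example: volume 2, elongated shape, 30-degree rotation in the plane
        theta <- pi / 6
        D <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
        make_sigma(lambda = 2, shape = c(2, 0.5), orientation = D)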

    tclust: An R Package for a Trimming Approach to Cluster Analysis

    Outlying data can heavily influence standard clustering methods. At the same time, clustering principles can be useful when robustifying statistical procedures. These two reasons motivate the development of feasible robust model-based clustering approaches. With this in mind, an R package for performing non-hierarchical robust clustering, called tclust, is presented here. Instead of trying to “fit” noisy data, a proportion α of the most outlying observations is trimmed. The tclust package efficiently handles different cluster scatter constraints. Graphical exploratory tools are also provided to help the user make sensible choices for the trimming proportion as well as the number of clusters to search for.
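
    A minimal usage illustration follows; function and argument names reflect recent versions of the package and may differ slightly across releases, and the simulated data are purely for demonstration.

        # Illustrative tclust session on simulated two-group data with scattered noise
        library(tclust)

        set.seed(1)
        x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
                   matrix(rnorm(200, mean = 5), ncol = 2),
                   matrix(runif(20, -10, 15), ncol = 2))   # 10 background outliers

        # Robust clustering: k = 2 groups, 8% trimming, eigenvalue-ratio bound 50
        clus <- tclust(x, k = 2, alpha = 0.08, restr.fact = 50)
        plot(clus)

        # Classification trimmed likelihood curves, an exploratory tool for choosing
        # the number of clusters and the trimming proportion
        ctl <- ctlcurves(x, k = 1:3, alpha = seq(0, 0.15, by = 0.05))
        plot(ctl)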

    Fuzzy Clustering Through Robust Factor Analyzers

    In fuzzy clustering, data elements can belong to more than one cluster, and membership levels are associated with each element to indicate the strength of the association between that element and a particular cluster. Unfortunately, fuzzy clustering is not robust, while in real applications the data are often contaminated by outliers and noise, and the assumed underlying Gaussian distributions may be unrealistic. Here we propose a robust fuzzy estimator for clustering through Factor Analyzers, introducing the joint use of trimming and of constrained estimation of the noise matrices in the classic maximum likelihood approach.

    Graphical and computational tools to guide parameter choice for the cluster weighted robust model

    The Cluster Weighted Robust Model (CWRM) is a recently introduced methodology to robustly estimate mixtures of regressions with random covariates. The CWRM allows users to flexibly perform regression clustering while safeguarding against data contamination and spurious solutions. Nonetheless, the resulting solution depends on the chosen number of components in the mixture, the percentage of impartial trimming, and the degree of heteroscedasticity allowed for the errors around the regression lines and for the clusters in the explanatory variables. An appropriate model selection step is therefore crucial. Such a complex modeling task may generate several “legitimate” solutions, each derived from a distinct hyperparameter specification. The present paper introduces a two-step monitoring procedure to help users effectively explore such a vast model space. The first phase uncovers the most appropriate percentages of trimming, whilst the second phase explores the whole set of solutions, conditioning on the outcome of the previous step. The final output singles out a set of “top” solutions, whose optimality, stability and validity are assessed. Novel graphical and computational tools, specifically tailored to the CWRM framework, help the user make an educated choice among the optimal solutions. Three examples on real datasets showcase our proposal in action. Supplementary files for this article are available online.
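
    The first monitoring phase can be pictured with a highly simplified skeleton; fit_cwrm below is a hypothetical fitter (not a function from any package named here) returning the maximized objective, and the stabilization heuristic is only one possible reading of that phase.

        # Skeleton of the first phase: refit the model over a grid of trimming
        # levels (for a fixed number of groups G) and record the objective, so the
        # user can inspect where it stabilizes. `fit_cwrm` is a hypothetical fitter
        # returning a list with an `obj` component.
        monitor_trimming <- function(y, X, G, alphas = seq(0, 0.2, by = 0.02), fit_cwrm) {
          data.frame(alpha = alphas,
                     objective = sapply(alphas, function(a) fit_cwrm(y, X, G = G, alpha = a)$obj))
        }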

    A Fuzzy Approach to Robust Clusterwise Regression

    A new robust fuzzy linear clustering method is proposed. We estimate the coefficients of a linear regression model in each unknown cluster. Our method aims to achieve robustness by trimming a fixed proportion of observations. Assignments to clusters are fuzzy: observations contribute to estimates in more than one cluster. We describe general criteria for tuning the method. The proposed method appears to be robust with respect to different types of contamination.
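
    The coefficient update at the heart of such a procedure can be sketched as follows, assuming a membership matrix u (one row per observation, one column per cluster, with trimmed observations given zero weight in every column) has already been computed; this illustrates the weighted least-squares idea, not the paper's exact estimator.

        # Weighted least-squares update of the clusterwise regression coefficients,
        # using the fuzzy memberships as observation weights (zero for trimmed points).
        update_coefficients <- function(y, X, u) {
          lapply(seq_len(ncol(u)), function(g) {
            fit <- stats::lm.wfit(x = cbind(1, X), y = y, w = u[, g])
            fit$coefficients
          })
        }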

    Finding the Number of Groups in Model-Based Clustering via Constrained Likelihoods

    Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For this purpose, complexity-penalized likelihood approaches have been introduced in model-based clustering, such as the well-known BIC and ICL criteria. However, the classification/mixture likelihoods considered in these approaches are unbounded without any constraint on the cluster scatter matrices. Constraints also prevent traditional EM and CEM algorithms from being trapped in (spurious) local maxima. Controlling the maximal ratio between the eigenvalues of the scatter matrices so that it is smaller than a fixed constant c ≥ 1 is a sensible way of setting such constraints. A new penalized likelihood criterion, which takes into account the higher model complexity that a larger value of c entails, is proposed. Based on this criterion, a novel and fully automated procedure is provided, leading to a small ranked list of optimal (k, c) pairs. Its performance is assessed both in empirical examples and through a simulation study as a function of cluster overlap.
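
    The model-selection idea can be sketched as a grid search over (k, c); everything below is schematic: fit_constrained is a hypothetical fitter returning the maximized log-likelihood and a parameter count, and the extra log(c) toll is only a placeholder for the criterion actually derived in the paper.

        # Schematic grid search over (k, c): rank the pairs by a penalized
        # likelihood. The penalty shown is a BIC-style placeholder inflated by a
        # term growing with c, NOT the criterion proposed in the paper.
        select_k_c <- function(x, k_grid = 1:5, c_grid = c(1, 4, 16, 64), fit_constrained) {
          grid <- expand.grid(k = k_grid, c = c_grid)
          grid$crit <- apply(grid, 1, function(p) {
            fit <- fit_constrained(x, k = p["k"], c = p["c"])
            -2 * fit$loglik + (fit$npar + log(p["c"])) * log(nrow(x))
          })
          grid[order(grid$crit), ]   # smaller criterion values first
        }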

    A Reweighting Approach to Robust Clustering

    An iteratively reweighted approach for robust clustering is presented in this work. The method is initialized with a very robust clustering partition based on a high trimming level. The initial partition is then refined to reduce the number of wrongly discarded observations and to substantially increase efficiency. Simulation studies and real data examples indicate that the final clustering solution is both robust and efficient, and naturally adapts to the true underlying contamination level.
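
    A single reinstatement step of such a scheme could look like the sketch below, which assumes Gaussian clusters and a chi-square cut-off on Mahalanobis distances; this is one natural reading of the reweighting idea, not the paper's exact rule.

        # One reinstatement step: observations discarded by an initial, heavily
        # trimmed fit are re-included whenever their Mahalanobis distance to the
        # closest cluster falls below a chi-square quantile.
        reinstate <- function(x, centers, covs, level = 0.975) {
          d2 <- sapply(seq_along(covs), function(g)
            mahalanobis(x, center = centers[g, ], cov = covs[[g]]))
          best <- apply(d2, 1, min)
          best <= qchisq(level, df = ncol(x))   # TRUE = keep / re-include the observation
        }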